KL Divergence

Also called "information gain", the KL divergence is a measure of the difference between two probability distributions P and Q. It is not symmetric and does not obey the triangle inequality, so it is not a true metric.
The KL divergence from Q to P:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx = \mathbb{E}_p\!\left[\log \frac{p(x)}{q(x)}\right]$$

In information theory, $D_{\mathrm{KL}}(P \,\|\, Q)$ is the expected number of extra bits (or nats, when the natural logarithm is used as above) needed to encode samples from P with a code optimized for Q rather than a code optimized for P; equivalently, it is the cross-entropy minus the entropy of P: $D_{\mathrm{KL}}(P \,\|\, Q) = H(P, Q) - H(P)$.
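
As a quick numeric illustration of the definition (a minimal sketch assuming NumPy and SciPy are available; the distributions `p` and `q` are arbitrary example values), the discrete analogue $\sum_x p(x)\log\frac{p(x)}{q(x)}$ can be computed directly and checked against `scipy.stats.entropy` and the cross-entropy identity:

```python
import numpy as np
from scipy.stats import entropy

# Two discrete distributions over the same support (arbitrary example values).
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.8, 0.15, 0.05])

# Discrete analogue of the definition: D_KL(P || Q) = sum_x p(x) log(p(x) / q(x)).
kl_pq = np.sum(p * np.log(p / q))

# scipy.stats.entropy(p, q) computes the same quantity (natural log by default).
assert np.isclose(kl_pq, entropy(p, q))

# Cross-entropy minus entropy recovers the divergence: D_KL(P || Q) = H(P, Q) - H(P).
cross_entropy = -np.sum(p * np.log(q))
assert np.isclose(kl_pq, cross_entropy - entropy(p))

# The divergence is not symmetric: D_KL(P || Q) != D_KL(Q || P) in general.
kl_qp = np.sum(q * np.log(q / p))
print(kl_pq, kl_qp)
```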

Proof that $D_{\mathrm{KL}}(Q \,\|\, P) \ge 0$ (the same argument gives $D_{\mathrm{KL}}(P \,\|\, Q) \ge 0$):

$$0 = \log 1 = \log \int p(x)\,dx = \log \int \frac{p(x)}{q(x)}\, q(x)\,dx \ge \int q(x) \log \frac{p(x)}{q(x)}\,dx = \mathbb{E}_q\!\left[\log \frac{p(x)}{q(x)}\right] = -D_{\mathrm{KL}}(Q \,\|\, P),$$

so $D_{\mathrm{KL}}(Q \,\|\, P) \ge 0$.

The inequality step is due to Jensen's inequality:
$f(\mathbb{E}[x]) \ge \mathbb{E}[f(x)]$, if $f$ is concave (applied here with $f = \log$).

Note that $D_{\mathrm{KL}}(Q \,\|\, P) = 0$ iff $q(x) = p(x)$.
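
Both the non-negativity and the equality condition are easy to sanity-check numerically. The sketch below (assuming NumPy; the Dirichlet draws are arbitrary stand-ins for P and Q, so this is an illustration rather than a proof) also checks the Jensen step for the concave logarithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Discrete D_KL(P || Q) = sum_x p(x) log(p(x) / q(x))."""
    return np.sum(p * np.log(p / q))

# Jensen's inequality for the concave log: log(E[x]) >= E[log(x)].
x = rng.random(10_000) + 0.1             # positive samples
assert np.log(x.mean()) >= np.log(x).mean()

# D_KL(Q || P) >= 0 (and D_KL(P || Q) >= 0) for randomly drawn distribution pairs...
for _ in range(1_000):
    p = rng.dirichlet(np.ones(5))        # random 5-point distribution
    q = rng.dirichlet(np.ones(5))
    assert kl(q, p) >= 0 and kl(p, q) >= 0

# ...and the divergence vanishes when q(x) = p(x).
assert np.isclose(kl(p, p), 0.0)
```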

If P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution, while Q represents a theory, model, description, or approximation of P, then $D_{\mathrm{KL}}(P \,\|\, Q)$ measures the information lost when Q is used to approximate P.
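
As an illustration of Q approximating a "true" P, the sketch below (assuming NumPy and SciPy; the two Gaussians and their parameters are made up for the example) compares a Monte Carlo estimate of $\mathbb{E}_p[\log p(x) - \log q(x)]$ against the standard closed-form KL divergence between two univariate Gaussians:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# "True" distribution P and an approximating model Q (both Gaussian here).
mu_p, sigma_p = 0.0, 1.0
mu_q, sigma_q = 0.5, 1.5

# Monte Carlo estimate of D_KL(P || Q) = E_p[log p(x) - log q(x)].
x = rng.normal(mu_p, sigma_p, size=200_000)   # samples drawn from P
mc_estimate = np.mean(norm.logpdf(x, mu_p, sigma_p) - norm.logpdf(x, mu_q, sigma_q))

# Closed form for two univariate Gaussians:
# log(sigma_q / sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2
closed_form = (np.log(sigma_q / sigma_p)
               + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
               - 0.5)

print(mc_estimate, closed_form)   # the two values should agree closely
```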

References

Introduction to variational Bayesian methods: https://www.youtube.com/watch?v=HOkkr4jXQVg
KL Divergence: https://en.wikipedia.org/wiki/Kullback–Leibler_divergence